Efficient Document Clustering via Online Nonnegative Matrix Factorizations
Authors
Abstract
In recent years, Nonnegative Matrix Factorization (NMF) has received considerable interest from the data mining and information retrieval fields. NMF has been successfully applied in document clustering, image representation, and other domains. This study proposes an online NMF (ONMF) algorithm to efficiently handle very large-scale and/or streaming datasets. Unlike conventional NMF solutions, which require the entire data matrix to reside in memory, our ONMF algorithm processes one data point or one chunk of data points at a time. Experiments with one-pass and multi-pass ONMF on real datasets are presented.
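To make the chunk-at-a-time idea concrete, here is a minimal sketch of one possible online NMF scheme. It is not the ONMF algorithm from the paper, whose update rules are not given in this abstract. The sketch assumes a squared-error objective, multiplicative updates, and running sufficient statistics (`A`, `S`) so that earlier chunks never need to be revisited. The names `online_nmf`, `assign_clusters`, `inner_iters`, and `eps` are hypothetical and chosen only for illustration.

```python
import numpy as np


def online_nmf(chunks, n_components, n_features, inner_iters=50, eps=1e-9, seed=0):
    """Illustrative online NMF: each chunk X (docs x features) is approximated as C @ B.

    Assumption: squared-error loss with multiplicative updates. Only the current
    chunk and two small accumulators are held in memory at any time.
    """
    rng = np.random.default_rng(seed)
    B = rng.random((n_components, n_features)) + eps   # nonnegative basis (topics)
    A = np.zeros((n_components, n_components))          # running sum of C.T @ C
    S = np.zeros((n_components, n_features))            # running sum of C.T @ X
    codes = []
    for X in chunks:
        # Step 1: fit nonnegative codes for this chunk with the basis held fixed.
        C = rng.random((X.shape[0], n_components)) + eps
        BBt = B @ B.T
        XBt = X @ B.T
        for _ in range(inner_iters):
            C *= XBt / (C @ BBt + eps)
        # Step 2: fold the chunk into the sufficient statistics.
        A += C.T @ C
        S += C.T @ X
        # Step 3: refresh the basis from the statistics alone (no old data needed).
        for _ in range(inner_iters):
            B *= S / (A @ B + eps)
        codes.append(C)
    return B, codes


def assign_clusters(codes):
    """Assign each document to the component with the largest code value."""
    return np.concatenate([C.argmax(axis=1) for C in codes])
```

In this sketch a single loop over `chunks` corresponds to a one-pass run. A multi-pass variant could feed the same chunks through again while keeping `B`, at the cost of reading the data more than once.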
Similar resources
Subtractive Initialization of Nonnegative Matrix Factorizations for Document Clustering
Nonnegative matrix factorizations (NMF) have recently assumed an important role in several fields, such as pattern recognition, automated image exploitation, data clustering and so on. They represent a peculiar tool adopted to obtain a reduced representation of multivariate data by using additive components only, in order to learn parts-based representations of data. All algorithms for computin...
Fast Local Algorithms for Large Scale Nonnegative Matrix and Tensor Factorizations
Nonnegative matrix factorization (NMF) and its extensions such as Nonnegative Tensor Factorization (NTF) have become prominent techniques for blind sources separation (BSS), analysis of image databases, data mining and other information retrieval and clustering applications. In this paper we propose a family of efficient algorithms for NMF/NTF, as well as sparse nonnegative coding and represent...
Algorithms for Nonnegative Tensor Factorization
Nonnegative Matrix Factorization (NMF) is an efficient technique to approximate a large matrix containing only nonnegative elements as a product of two nonnegative matrices of significantly smaller size. The guaranteed nonnegativity of the factors is a distinctive property that other widely used matrix factorization methods do not have. Matrices can also be seen as second-order tensors. For som...
Nonnegative Matrix Factorization with Orthogonality Constraints
Nonnegative matrix factorization (NMF) is a popular method for multivariate analysis of nonnegative data, the goal of which is to decompose a data matrix into a product of two factor matrices with all entries in factor matrices restricted to be nonnegative. NMF was shown to be useful in a task of clustering (especially document clustering), but in some cases NMF produces the results inappropria...
Tensor Decompositions: A New Concept in Brain Data Analysis?
Matrix factorizations and their extensions to tensor factorizations and decompositions have become prominent techniques for linear and multilinear blind source separation (BSS), especially multiway Independent Component Analysis (ICA), Nonnegative Matrix and Tensor Factorization (NMF/NTF), Smooth Component Analysis (SmoCA) and Sparse Component Analysis (SCA). Moreover, tensor decompositions hav...
Journal title:
Volume / Issue:
Pages: -
Publication date: 2011